COMPLEXITY CLASSES OF EQUIVALENCE PROBLEMS REVISITED 



LANCE FORTNOW AND JOSHUA A. GROCHOW 



Abstract. To determine if two given lists of numbers are the same set, we would sort both lists 
and see if we get the same result. The sorted list is a canonical form for the equivalence relation 
of set equality. Other canonical forms for equivalences arise in graph isomorphism and its variants, 
and the equality of permutation groups given by generators. To determine if two given graphs are 
cospectral (have the same eigenvalues), however, we compute their characteristic polynomials and 
see if they are the same; the characteristic polynomial is a complete invariant for the equivalence 
relation of cospectrality. This is weaker than a canonical form, and it is not known whether a 
polynomial-time canonical form for cospectrality exists. Note that it is a priori possible for an 
equivalence relation to be decidable in polynomial time without either a complete invariant or 
canonical form. 

Blass and Gurevich ("Equivalence relations, invariants, and normal forms, I and II", 1984) 
ask whether these conditions on equivalence relations - having an FP canonical form, having an 
FP complete invariant, and simply being in P - are in fact different. They showed that this 
question requires non-relativizing techniques to resolve. Here we extend their results, and give new 
connections to probabilistic and quantum computation. 

We denote the class of equivalence problems in P by PEq, the class of problems with complete 
FP invariants Ker, and the class with FP canonical forms CF; CF ⊆ Ker ⊆ PEq, and we ask whether
these inclusions are proper. If x ~ y implies |y| ≤ poly(|x|), we say that ~ is polynomially bounded;
we denote the corresponding classes of equivalence relations CF_p, Ker_p, and PEq_p. Our main results
are: 

• If CF = PEq then NP = UP = RP and thus PH = BPP; 

• If CF = Ker then NP = UP, PH = ZPP^NP, integers can be factored in probabilistic polynomial
time, and deterministic collision-free hash functions do not exist;

• If Ker = PEq then UP ⊆ BQP;

• There is an oracle relative to which CF ≠ Ker ≠ PEq; and

• There is an oracle relative to which CF_p = Ker_p and Ker ≠ PEq.

Attempting to generalize the third result above from UP to NP leads to a new open question 
about the structure of witness sets for NP problems (roughly: can the witness sets for an NP- 
complete problem form an Abelian group?). We also introduce a natural notion of reduction 
between equivalence problems, and present several open questions about generalizations of these 
concepts to the polynomial hierarchy, to logarithmic space, and to counting problems. 



1. Introduction 

Equivalence relations and their associated algorithmic problems arise throughout mathematics 
and computer science. Examples run the gamut from trivial — decide whether two lists contain 
the same set of elements — to undecidable — decide whether two finitely presented groups are 
isomorphic [Nov55, Boo57]. Some examples are of great mathematical importance, and some are
of great interest to complexity theorists, such as graph isomorphism (GI). 

Complete invariants are a common tool for finding algorithmic solutions to equivalence problems. 
Normal or canonical forms — where a unique representative is chosen from each equivalence class 
as the invariant of that class — are also quite common, particularly in algorithms for GI and 
its variants [HT72, BL83, FSS83, Mil80, BGM82]. More recently, Agrawal and Thierauf [AT00,
Thi00] used a randomized canonical form to show that Boolean formula non-isomorphism (FT) is
in AM^NP. More generally, the book by Thierauf [Thi00] gives an excellent overview of equivalence
and isomorphism problems in complexity theory. 

Many efficient algorithms for special cases of GI have been upgraded to canonical forms or 
complete invariants. Are these techniques necessary for an efficient algorithm? Are these techniques 
distinct? Gary Miller [Mil80] pointed out that GI has a polynomial-time complete invariant iff it
has a polynomial-time canonical form (see also [Gur97]). The general form of this question is
central both in [BG84a, BG84b] and here: are canonical forms or complete invariants necessary for
the efficient solution of equivalence problems? 

In 1984, Blass and Gurevich [BG84a, BG84b] introduced complexity classes to study these algorithmic
approaches to equivalence problems. Although we came to the same definitions and many
of the same results independently, this work can be viewed partially as an update and a follow-up 
to their papers in light of the intervening 25 years of complexity theory. The classes UP, RP, and 
BQP, the function classes NPMV (multi-valued functions computed by NP machines) and NPSV 
(single-valued functions computed by NP machines), and generic oracle (forcing) methods feature
prominently in this work. 

Blass and Gurevich [BG84a, BG84b] introduced the following four problems and the associated
complexity classes. Where they use "normal form" we say "canonical form," though the terms are
synonymous and the choice is immaterial. We also introduce new notation for these complexity
classes that makes the distinction between language classes and function classes more explicit. For
an equivalence relation R ⊆ Σ* × Σ*, they defined:

The recognition problem: given x, y ∈ Σ*, decide whether x ~_R y.

The invariant problem: for x ∈ Σ*, calculate a complete invariant f(x) for R, that is, a function
such that x ~_R y iff f(x) = f(y).

The canonical form problem: for x ∈ Σ*, calculate a canonical form f(x) for R, that is, a function
such that x ~_R f(x) for all x ∈ Σ*, and x ~_R y implies f(x) = f(y).

The first canonical form problem: for x ∈ Σ*, calculate the first y ∈ Σ* such that y ~_R x. Here,
"first" refers to the standard length-lexicographic ordering on Σ*, though any ordering that can be
computed easily enough would suffice.

The corresponding polynomial-time complexity classes are defined as follows: 

Definition 1.1. PEq consists of those equivalence relations whose recognition problem has a 
polynomial-time solution. Ker(FP) consists of those equivalence relations that have a polynomial- 
time computable complete invariant. CF(FP) consists of those equivalence relations that have a 
polynomial-time canonical form. LexEqFP consists of those equivalence relations whose first canon- 
ical form is computable in polynomial time. 
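
As an informal illustration of the distinction drawn in Definition 1.1, the following Python sketch instantiates the recognition, invariant, and canonical form problems for the set-equality relation on lists of integers. The function names are illustrative assumptions and do not come from [BG84a, BG84b]; the point is only that a canonical form must itself be an equivalent list, whereas a complete invariant may be any object.

```python
from typing import FrozenSet, List


def recognize(xs: List[int], ys: List[int]) -> bool:
    """Recognition problem: do the two lists contain the same set of elements?"""
    return set(xs) == set(ys)


def complete_invariant(xs: List[int]) -> FrozenSet[int]:
    """A complete invariant: outputs agree iff the inputs are equivalent.

    The output is a frozenset rather than a list, so it is not required to
    be a member of the equivalence class it describes.
    """
    return frozenset(xs)


def canonical_form(xs: List[int]) -> List[int]:
    """A canonical form: a list equivalent to xs, identical across the class."""
    return sorted(set(xs))


a, b, c = [3, 1, 3, 2], [2, 2, 1, 3], [1, 2]
print(recognize(a, b), recognize(a, c))
print(canonical_form(a), canonical_form(b))
```

Every canonical form is in particular a complete invariant, which is the inclusion CF ⊆ Ker in miniature.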

We occasionally omit the "FP" from the latter three classes. It is obvious that 

LexEq ⊆ CF ⊆ Ker ⊆ PEq,

and our first guiding question is: which of these inclusions is tight? 

Blass and Gurevich showed that none of the four problems above polynomial-time Turing-reduces 
(Cook-reduces) to the next in line. We extend their results using forcing, and we also give fur- 
ther complexity-theoretic evidence for the separation of these classes, giving new connections to 
probabilistic and quantum computing. Our main results in this regard are: 

Proposition 4.10. If CF = Ker then integers can be factored in probabilistic polynomial time.

Proposition 4.11. If CF = Ker then collision-free hash functions that can be evaluated in deterministic
polynomial time do not exist.

Theorem 4.3. If Ker = PEq then UP ⊆ BQP. If CF = PEq then UP ⊆ RP.

We also show the following two related results:

Corollary 4.2. If CF = Ker then NP = UP and PH = ZPP^NP.

Corollary 4.4. If CF = PEq then NP = UP = RP and in particular, PH = BPP.

Corollary 4.2 follows from the slightly stronger Theorem 4.1, but we do not give the statement here
as it requires further definitions.

The remainder of the paper is organized as follows. In Section 2 we give preliminary definitions
and background. In Section 3 we review the original results of Blass and Gurevich [BG84a, BG84b].
We also combine their results with other results that have appeared in the past 25 years to yield
some immediate extensions. In Section 4.1 we prove new results connecting these classes with
probabilistic and quantum computation. In Section 4.1.1 we introduce a group-like condition on
the witness sets of NP-complete problems that would allow us to extend the first half of Theorem 4.3
from UP to NP, giving much stronger evidence that Ker ≠ PEq. We believe the question of whether
any NP-complete sets have this property is of independent interest: a positive answer would provide
nontrivial quantum algorithms for NP problems, and a negative answer would provide further
concrete evidence for the lack of structure in NP-complete problems. In Sections 4.2.1 and 4.2.2 we
discuss connections with integer factoring and collision-free hash functions, respectively. In Section
4.2.5 we introduce a notion of reduction between equivalence relations and the corresponding notion
of completeness. In Section 5 we update and extend some of the oracle results of [BG84a, BG84b]
using forcing techniques. In the final section we mention several directions for further research, in
addition to the several open questions scattered throughout the paper.

2. Preliminaries 

We assume the reader is familiar with standard complexity classes such as P, NP, BPP, and the 
polynomial hierarchy PH = ⋃ Σ_k^P = ⋃ Π_k^P = ⋃ Δ_k^P. We refer the reader to the Complexity Zoo
at http://qwiki.stanford.edu/wiki/Complexity_Zoo for more details. 

A language L is in the class UP if there is a nondeterministic machine deciding L that has at 
most one accepting path on each input. 

The class BQP consists of those languages that can be decided on a quantum computer in poly- 
nomial time with error strictly bounded away from 1/2. For more details on quantum computing, 
we recommend the book by Nielsen and Chuang [NC00].

2.1. Function Classes. Complexity-bounded function classes are defined in terms of Turing trans- 
ducers. A transducer only outputs a value if it enters an accepting state. In general, then, a 
nondeterministic transducer can be partial and/or multi-valued. For such a function f, we write

set-f(x) = {y : some accepting computation of f outputs y}.

The domain of a partial multi-valued function is the set

dom(f) = {x : set-f(x) ≠ ∅}.

The graph of a partial multi-valued function is the set

graph(f) = {(x, y) : y ∈ set-f(x)}.

The class FP is the class of all total functions computable in polynomial time. The class PF is 
the class of all partial functions computable in polynomial time. Note that machines computing a 
PF function must halt in polynomial time even when they make no output. 

The class NPSV consists of all single-valued partial functions computable by a nondeterministic
polynomial-time transducer. Note that multiple branches of an NPSV transducer may accept, but
they must all have the same output. The class NPMV consists of all multi-valued partial functions
computable by a nondeterministic polynomial-time transducer. The classes NPSV_t and NPMV_t are
the subclasses of NPSV and NPMV, respectively, consisting of the total functions in those classes.
The classes NPSV_g and NPMV_g are the subclasses of NPSV and NPMV, respectively, whose graphs
are in P.

A refinement of a multi-valued partial function f is a multi-valued partial function g such that
dom(g) = dom(f) and set-g(x) ⊆ set-f(x) for all x. In particular, if set-f(x) is nonempty then so
is set-g(x). If F_1 and F_2 are two classes of partial multi-valued functions, then

F_1 ⊆_c F_2

means that every function in F_1 has a refinement in F_2.

It is known that NPMV ⊆_c PF iff P = NP [Sel92] iff NPSV ⊆ PF [SXB83]. Selman [Sel94] is one
of the classic works in this area, and gives many more results regarding these function classes.
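
To make the notions of domain, graph, and refinement concrete, here is a minimal Python sketch that models a partial multi-valued function on a finite set of inputs as a dictionary from inputs to sets of outputs. This finite model and all names in it are assumptions made for illustration; they say nothing about the complexity of computing such functions.

```python
from typing import Dict, Set, Tuple

# x -> set-f(x); inputs with no accepting computation map to the empty set
MultiFn = Dict[str, Set[str]]


def dom(f: MultiFn) -> Set[str]:
    """dom(f): inputs on which some accepting computation produces an output."""
    return {x for x, ys in f.items() if ys}


def graph(f: MultiFn) -> Set[Tuple[str, str]]:
    """graph(f): all pairs (x, y) with y in set-f(x)."""
    return {(x, y) for x, ys in f.items() for y in ys}


def is_single_valued(f: MultiFn) -> bool:
    """True when every input has at most one output."""
    return all(len(ys) <= 1 for ys in f.values())


def is_refinement(g: MultiFn, f: MultiFn) -> bool:
    """g refines f: same domain, and set-g(x) is a subset of set-f(x) for all x."""
    return dom(g) == dom(f) and all(g[x] <= f[x] for x in dom(g))


f = {"a": {"0", "1"}, "b": set(), "c": {"1"}}
g = {"a": {"0"}, "c": {"1"}}   # single-valued refinement of f
h = {"a": {"0"}}               # not a refinement: "c" dropped from the domain
print(is_refinement(g, f), is_single_valued(g), is_refinement(h, f))
```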

2.2. Equivalence Relations. For an equivalence relation R ⊆ Σ* × Σ*, we write x ~_R y if
(x, y) ∈ R. We write [x]_R for the R-equivalence class of x. The kernel of a function f is the
equivalence relation Ker(f) = {(x, y) : f(x) = f(y)}. For an equivalence relation R, if R = Ker(f),
we say that f is a complete invariant for R. If, furthermore, x ~_R f(x) for every x, then f is a
canonical form for R. If, further still, f(x) is the first member of [x]_R under length-lexicographic
order, we say that f is the first canonical form for R. The trivial relation is all of Σ* × Σ*, that is, all
strings are equivalent under the trivial relation, or equivalently [x] = Σ* for all x.
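
The first canonical form can always be found by exhaustive search once the recognition problem is solvable, although the search below takes exponential time in general. The following Python sketch enumerates binary strings in length-lexicographic order and returns the first one equivalent to the input; the two example relations are hypothetical and chosen only to show how the answer depends on whether the relation is length-restricted.

```python
from itertools import count, product
from typing import Callable, Iterator


def length_lex(alphabet: str = "01") -> Iterator[str]:
    """Enumerate all strings over the alphabet in length-lexicographic order."""
    for n in count(0):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


def first_canonical_form(x: str, equiv: Callable[[str, str], bool]) -> str:
    """Return the first y (length-lexicographically) with y ~ x.

    Terminates because x ~ x, so the search stops at x at the latest.
    """
    for y in length_lex():
        if equiv(y, x):
            return y
    raise AssertionError("unreachable: x is equivalent to itself")


def same_weight(u: str, v: str) -> bool:
    """Toy relation: same number of 1s (not length-restricted)."""
    return u.count("1") == v.count("1")


def same_length_and_weight(u: str, v: str) -> bool:
    """Toy length-restricted relation: same length and same number of 1s."""
    return len(u) == len(v) and same_weight(u, v)


print(first_canonical_form("0110", same_weight))
print(first_canonical_form("0110", same_length_and_weight))
```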

An equivalence relation is length-restricted if x ~ y implies |x| = |y|. An equivalence relation
is polynomially bounded if there is a polynomial p such that x ~ y implies |x| ≤ p(|y|). Note that
the first canonical form for a polynomially bounded equivalence relation is a polynomially honest
function. If C is a class of equivalence relations, we write C_= for the class of length-restricted
equivalence relations in C, and C_p for the class of polynomially bounded equivalence relations in C.

2.3. Generic Oracles and Forcing. In Section 5 we will use generic oracles for various notions
of genericity. Here we give a brief overview of the definitions and basic results on generic oracles;
for a more in-depth discussion, see [FFKL03].

Throughout this section we will use a to represent a string in Σ*, O or X to represent oracles, σ,
τ, and γ to represent conditions (collections of oracles), and 𝒢 to represent a notion of genericity
(a collection of conditions). All these terms are defined below.

Informally, a condition is a collection γ of oracles in which it is possible to carry out any oracle
construction that proceeds by finite extensions. If an oracle O ∈ γ, we may think of O as "satisfying
γ." Formally:

Definition 2.1. A condition is a nonempty perfect subset γ of 2^{Σ*}. In other words, it is a collection
of oracles such that if O ∈ γ, and σ is any finite initial segment of O, then there are two (and hence
infinitely many) distinct oracles in γ extending σ.

If γ and τ are two conditions, and γ ⊆ τ, we say that γ extends τ. Although this terminology
at first may seem backwards, it is used because a smaller set of oracles more fully specifies an
oracle. For example, if τ is the set of all oracles with initial segment 010 and γ is the set of all oracles
with initial segment 010001011, then γ extends τ, and γ more fully specifies an oracle. Moreover,
as in the preceding example, this terminology is consistent with the usage for finite strings: we say
that 010001011 extends 010 and that γ extends τ.

Informally, a notion of genericity is a collection 𝒢 of conditions in which it is possible to carry
out a forcing construction (which we'll explain shortly). Formally:

Definition 2.2. A notion of genericity is a nonempty set 𝒢 of conditions such that, for all γ ∈ 𝒢,
all O ∈ γ, and all a ∈ Σ*, there is a condition γ′ ∈ 𝒢 such that γ′ ⊆ γ and (∀O′ ∈ γ′)[O′(a) = O(a)].
The conditions in 𝒢 are called 𝒢-conditions.

Here we define a very restricted notion of forcing, as it suffices for our purposes. A more general
definition of forcing is given in [FFKL03].

Definition 2.3 (Forcing). A condition γ forces X(a) = 0 if X(a) = 0 for all X ∈ γ. In this case,
we write γ ⊩ X(a) = 0. Similarly for X(a) = 1. If γ forces either X(a) = 0 or X(a) = 1, we say
that γ forces X(a), or forces the value of a.

A generic oracle is simply one built by a forcing construction. This notion is formalized in the 
definition of a generic filter: 

Definition 2.4 (Generic Filter). If 𝒢 is a notion of genericity, a generic filter over 𝒢 is a subset
G ⊆ 𝒢 such that

(1) For all σ, τ ∈ 𝒢, if τ ∈ G and τ ⊆ σ then σ ∈ G (i.e., upward closure);

(2) For all σ_1, σ_2 ∈ G, there is a τ ∈ G such that τ ⊆ σ_1 ∩ σ_2 (i.e., every two conditions in G
have a common extension); and

(3) For each a ∈ Σ*, there is a γ ∈ G such that γ forces the value of a.

Conditions (1) and (2) are the definition of a filter (which is used more broadly in logic and 
topology). 

In particular, condition (2) implies that any two conditions in a filter G have nonempty intersection.
By the compactness of Cantor space 2^{Σ*}, this means that the intersection of all conditions in
G is nonempty: ⋂G ≠ ∅. Furthermore, condition (3), the forcing condition, implies that for every
a ∈ Σ*, there is a condition γ ∈ G forcing the value of a, i.e., forcing X(a) = b where b ∈ {0, 1}.
If O ∈ ⋂G, then O ∈ γ, so O(a) = b. Since the value of O(a) is forced for all a, O is uniquely
determined by G. Hence ⋂G consists of a single oracle.

Definition 2.5 (Generic Oracle). Let 𝒢 be a notion of genericity, let G be a generic filter over 𝒢,
and let ⋂G = {O}. Then we say that G builds O, and that O is a 𝒢-generic oracle.

Lemma 2.6 (Existence of 𝒢-generic oracles). For every notion of genericity 𝒢, the set of 𝒢-generic
oracles is dense in 𝒢. In other words, for every 𝒢-condition γ, there is a 𝒢-generic oracle O ∈ γ.

A Cohen condition is a condition γ specified by a partial characteristic function with finite
domain. In other words, if A and B are disjoint finite sets of strings, then the set of all oracles
including A and excluding B, i.e., {X : (∀a ∈ A)[X(a) = 1] and (∀b ∈ B)[X(b) = 0]}, is a Cohen
condition. This gives rise to the notion of Cohen genericity.

A UP condition is a Cohen condition γ that includes at most one string of each length, and
only includes strings of length tower(k), for any k. The tower function is defined inductively by
tower(0) = 1 and tower(n) = 2^{tower(n−1)}. This gives rise to the notion of UP-generics. We use
both Cohen generics and UP-generics in Section 5, as well as defining a new notion of genericity.
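
The following Python sketch represents a Cohen condition by its two finite sets A (included) and B (excluded), and shows the extension relation, forcing of a single value, the tower function, and the UP-condition check. The class and function names are illustrative assumptions; the check against tower values is truncated at a small bound (the hypothetical parameter max_k) because tower(5) already has tens of thousands of digits.

```python
from typing import FrozenSet, NamedTuple, Optional


class CohenCondition(NamedTuple):
    included: FrozenSet[str]  # the finite set A: strings forced into the oracle
    excluded: FrozenSet[str]  # the finite set B: strings forced out of the oracle

    def is_valid(self) -> bool:
        """A and B must be disjoint."""
        return not (self.included & self.excluded)

    def extends(self, other: "CohenCondition") -> bool:
        """self extends other: self specifies at least everything other does."""
        return other.included <= self.included and other.excluded <= self.excluded

    def forces(self, a: str) -> Optional[int]:
        """Return the forced value X(a), or None if the condition leaves a open."""
        if a in self.included:
            return 1
        if a in self.excluded:
            return 0
        return None


def tower(n: int) -> int:
    """tower(0) = 1 and tower(n) = 2 ** tower(n - 1)."""
    return 1 if n == 0 else 2 ** tower(n - 1)


def is_up_condition(c: CohenCondition, max_k: int = 5) -> bool:
    """At most one included string per length, and only lengths tower(k), k < max_k."""
    allowed = {tower(k) for k in range(max_k)}
    lengths = [len(s) for s in c.included]
    return len(lengths) == len(set(lengths)) and all(n in allowed for n in lengths)


tau = CohenCondition(frozenset({"1"}), frozenset({"0"}))
gamma = CohenCondition(frozenset({"1", "10"}), frozenset({"0"}))
print(gamma.extends(tau), tau.forces("10"), gamma.forces("10"))
```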

3. Previous Results 

Here we recall the previous results most relevant to our work. Most of the results in this section 
are from Blass and Gurevich [BG84a, BG84b]. We are not aware of any other prior work in this
area. However, results in other areas of computational complexity that have been obtained since 
1984 can be used as black boxes to extend their results, which we do here. 

If R ∈ PEq, then the language R′ = {(x, y) : (∃z)[z ≤_lex y and (x, z) ∈ R]} is in NP, and can be
used to perform a binary search for the first canonical form for R. Hence, PEq ⊆ LexEqFP^NP. The
first result shows that this containment is tight:

Theorem 3.1 ([BG84a, Theorem 1]). There is an equivalence relation R ∈ CF whose first canonical
form problem is essentially Δ_2^P-complete, that is, it is in FP^NP = FΔ_2^P and is Δ_2^P-hard. □
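
The binary search mentioned before Theorem 3.1 can be sketched in Python as follows. Binary strings are identified with natural numbers via their position in length-lexicographic order, and the NP oracle for R′ is replaced by a brute-force stand-in, so the sketch shows only the structure of the search and the number of oracle queries, not a polynomial-time algorithm. All names are illustrative assumptions.

```python
from typing import Callable, List


def index(s: str) -> int:
    """Position of the binary string s in length-lexicographic order."""
    return (1 << len(s)) - 1 + (int(s, 2) if s else 0)


def string(i: int) -> str:
    """Inverse of index: the i-th binary string in length-lexicographic order."""
    n = (i + 1).bit_length() - 1
    return format(i - ((1 << n) - 1), "b").zfill(n) if n else ""


def first_canonical_form(x: str, equiv: Callable[[str, str], bool],
                         queries: List[int]) -> str:
    """Binary search for the first z ~ x, using queries to R'."""

    def oracle(m: int) -> bool:
        # Stand-in for the NP oracle R': is there z <=_lex string(m) with z ~ x?
        queries.append(m)
        return any(equiv(string(j), x) for j in range(m + 1))

    lo, hi = 0, index(x)  # x ~ x, so the answer is at most index(x)
    while lo < hi:
        mid = (lo + hi) // 2
        if oracle(mid):
            hi = mid
        else:
            lo = mid + 1
    return string(lo)


def same_weight(u: str, v: str) -> bool:
    """Toy relation: same number of 1s."""
    return u.count("1") == v.count("1")


log: List[int] = []
print(first_canonical_form("0110", same_weight, log), len(log))
```

The number of oracle queries is at most the bit length of index(x), that is, linear in |x|.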

Note that the above proof that PEq ⊆ LexEqFP^NP relativizes, so all four polynomial-time classes
of equivalence relations are equal in any world where P = NP, in particular, relative to any PSPACE-
complete oracle. The next result gives relativized worlds in which Ker ≠ PEq, CF ≠ Ker, and
LexEq ≠ CF, though these worlds cannot obviously be combined.

Theorem 3.2 ([BG84a, Theorem 2]). Of the four equivalence problems defined above, none is Cook
reducible to the next in line. In particular:

(a) There is an equivalence relation R ∉ Ker(FP^R), i.e., Ker(FP^R) ≠ P^R Eq.

(b) There is a function f such that Ker(f) ∉ CF(FP^f), i.e., CF(FP^f) ≠ Ker(FP^f).

(c) There is an idempotent function f such that Ker(f) ∉ LexEqFP^f, i.e., LexEqFP^f ≠ CF(FP^f).

□

In addition to several extensions of these results, Blass and Gurevich [BG84a, BG84b] also show
that collapses between certain classes of equivalence problems are equivalent to more standard
complexity-theoretic hypotheses. Perhaps the most surprising of these are the equivalence between
CF(FP) ⊆ LexEqNPSV_t and NP = coNP ([BG84b, Thm. 1]), and the following result.

Theorem 3.3 ([BG84b, Theorem 3]). The following statements are equivalent:

(1) Ker(FP)_= ⊆ CF(NPSV_t);

(2) NP has the shrinking property;

(3) NPMV ⊆_c NPSV, i.e., the uniformization principle holds for NP.

We refer to [BG84b] for the definition of the shrinking property.

Hemaspaandra, Naik, Ogihara, and Selman [HNOS94] showed that if NPMV ⊆_c NPSV then
SAT ∈ (NP ∩ coNP)/poly. At the time, they showed that this implied PH = Σ_2^P; shortly thereafter
Köbler and Watanabe [KW95] improved the collapse to PH = ZPP^NP. Combined with Theorem 3.3,
this immediately implies a result that has not been announced previously:

Corollary 3.4. If CF = Ker then PH = ZPP^NP.

4. Evidence for Separation 

4.1. New Collapses. Blass and Gurevich's [BG84b] proof that Ker(FP)_= ⊆ CF(NPSV_t) ⟹
NPMV ⊆_c NPSV essentially shows the following slightly stronger result. However, as NPMV ⊆_c
NPSV is not known to imply NPMV_g ⊆_c NPSV_g, our result does not directly follow from their
result, but only from its proof, the core of which is reproduced here:

Theorem 4.1. If CF = Ker then NPMV_g ⊆_c NPSV_g.

Proof. Let f ∈ NPMV_g, let M be a nondeterministic polynomial-time transducer computing f, and
let V be a polynomial-time decider for graph(f). If CF = Ker, then the equivalence relation

{((x, y), (x, y′)) : V(x, y) = V(x, y′)} = Ker((x, y) ↦ (x, V(x, y)))

has a canonical form c ∈ FP. Then the following machine computes a refinement of f in NPSV_g:
simulate M(x). On each branch, if the output would be y, accept iff c(x, y) = (x, y). Hence
f ∈_c NPSV_g. □
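
The selection step in this proof can be illustrated on a toy multi-valued function. In the Python sketch below, f maps x to its nontrivial divisors, V decides graph(f), and the canonical form c that the hypothesis CF = Ker would supply is replaced by a brute-force search over a small finite range; that range, the choice of f, and all names are assumptions made purely for illustration.

```python
from typing import Set, Tuple

UNIVERSE = range(64)  # hypothetical finite range of inputs x and candidate outputs y


def V(x: int, y: int) -> bool:
    """Decider for graph(f): y is a nontrivial divisor of x."""
    return 1 < y < x and x % y == 0


def set_f(x: int) -> Set[int]:
    """All outputs of the toy multi-valued function f on input x."""
    return {y for y in UNIVERSE if V(x, y)}


def c(x: int, y: int) -> Tuple[int, int]:
    """Brute-force canonical form for (x, y) ~ (x, y') iff V(x, y) == V(x, y')."""
    return (x, min(z for z in UNIVERSE if V(x, z) == V(x, y)))


def refined(x: int) -> Set[int]:
    """Accept a branch with output y only if (x, y) is its own canonical form."""
    return {y for y in set_f(x) if c(x, y) == (x, y)}


print(set_f(12), refined(12), refined(7))
```

All outputs of f on a fixed x lie in a single equivalence class, so exactly one of them survives the test whenever set-f(x) is nonempty.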

Similar to the original result [BG84b], we can weaken the assumption of this theorem to
Ker(FP)_p ⊆ CF(NPSV_t), without modifying the proof. By padding, we can further weaken the
assumption to Ker(FP)_= ⊆ CF(NPSV_t).

Corollary 4.2. If CF = Ker then NP = UP and PH = ZPP^NP.

Note that Corollary 3.4 alone does not imply Corollary 4.2, as neither of the statements PH =
ZPP^NP and NP = UP is known to imply the other. Indeed, it is still an open question as to whether
NP = UP implies any collapse of PH whatsoever.

The next new result we present gives a new connection between complexity classes of equivalence 
problems and quantum and probabilistic computation: 



Theorem 4.3. If Ker = PEq then UP ⊆ BQP. If CF = PEq then UP ⊆ RP.

Proof. Suppose Ker = PEq. Let L be a language in UP and let V be a nondeterministic polynomial-time
machine with at most one accepting path for each input, such that x ∈ L ⟺ (∃y)[|y| ≤
p(|x|) and V(x, y) = 1] for some polynomial p. Consider the relation

R_L = {((a, x), (a, y)) : x = y or |x| = |y| and V(a, x ⊕ y) = 1}

where ⊕ denotes bitwise XOR. Clearly R_L ∈ PEq, so by hypothesis R_L has a complete invariant
f ∈ FP. Since L ∈ UP, for each a ∈ L there is a unique string w_a such that V(a, w_a) = 1. Define
f_a(x) = f(a, x). Then for all distinct x and x′, f_a(x) = f_a(x′) iff x ⊕ x′ = w_a. Given a and
f_a, and the promise that f_a is either injective or two-to-one in the manner described, finding w_a
or determining that there is no such string is exactly Daniel Simon's problem, which is in BQP
[Sim94].

Now suppose further that CF = PEq. Then we may take f to be not only a complete invariant
but further a canonical form for R_L. On input a, the following algorithm decides L in polynomial
time with bounded error: for each length ℓ ≤ p(|a|), pick a string x of length ℓ at random, compute
f((a, x)) = (a, y), and compute V(a, x ⊕ y). If V(a, x ⊕ y) = 1 for any length ℓ, output 1. Otherwise,
output 0. If a ∉ L then this algorithm always returns 0. If a ∈ L and 0^ℓ is a's witness, then the
algorithm always returns 1. If a ∈ L and 0^ℓ is not a's witness, then y ≠ x, and hence the answer is
correct, with probability 1/2. □
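
Both halves of this proof can be checked on a toy instance. The Python sketch below fixes a single witness length, encodes bit strings as integers, uses a hypothetical table of unique witnesses in place of a real UP verifier, and replaces the assumed polynomial-time canonical form by brute force. It is a simulation under these assumptions, not an implementation of the hypothesized algorithm.

```python
import random

N_BITS = 6  # single fixed witness length; the proof ranges over all lengths up to p(|a|)
WITNESS = {"yes-instance": 0b101100}  # hypothetical toy language with unique witnesses


def V(a: str, y: int) -> bool:
    """Toy verifier: accepts exactly the unique witness of a, if a has one."""
    return WITNESS.get(a) == y


def equivalent(a: str, x: int, y: int) -> bool:
    """The relation R_L restricted to strings of length N_BITS."""
    return x == y or V(a, x ^ y)


def canonical_form(a: str, x: int) -> int:
    """Brute-force stand-in for the canonical form assumed to be in FP."""
    return min(y for y in range(1 << N_BITS) if equivalent(a, x, y))


def decide(a: str, trials: int = 64, rng: random.Random = random.Random(0)) -> bool:
    """One-sided-error decision procedure: never accepts a non-member."""
    for _ in range(trials):
        x = rng.randrange(1 << N_BITS)
        y = canonical_form(a, x)
        if V(a, x ^ y):
            return True
    return False


print(decide("yes-instance"), decide("no-instance"))
```

For the member instance each equivalence class is a pair {x, x ⊕ w_a}, so the canonical form viewed as a function of x satisfies Simon's promise, and a single random trial reveals the witness for exactly half of the choices of x.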

Corollary 4.4. If CF = PEq then NP = UP = RP and in particular, PH = BPP.

Proof. If CF = PEq then it follows directly from Theorems 4.1 and 4.3 that NP = UP ⊆ RP. Thus
NP = RP, since RP ⊆ NP without any assumptions. Furthermore, it follows that PH ⊆ BPP
[Zac88], and since BPP ⊆ PH [Lau83, Sip83], the two are equal. □

The collapse inferred here is stronger than that of Corollary 3.4, since BPP ⊆ ZPP^NP [Sip83,
ZH86]. However, this result is incomparable to Corollary 3.4 since it also makes the stronger
assumption CF = PEq, rather than only assuming CF = Ker.

4.1.1. Groupy witnesses for NP problems. We would like to extend the first half of Theorem 4.3 from
UP to NP to give stronger evidence that Ker ≠ PEq, but the technique does not apply to arbitrary
problems in NP. However, if an NP problem's witnesses satisfy a certain group-like condition, then
Theorem 4.3 may be extended to that problem.

Let L ∈ NP and let V be a polynomial-time verifier for L. By padding if necessary, we may
suppose that for each a ∈ L, a's witnesses all have the same length. Suppose there is a polynomial-time
length-restricted group structure on Σ*, that is, a function f ∈ FP such that for each length
n, Σ^n is given a group structure defined by xy^{-1} := f(x, y). Then

R_L = {((a, x), (a, y)) : x = y or V(a, xy^{-1}) = 1}

is an equivalence relation iff a's witnesses are a subgroup of this group structure, or a subgroup
less the identity. The technique of Theorem 4.3 then reduces L to the hidden subgroup problem
over the family of groups defined by f.

The hidden subgroup problem, or HSP, for a group G is: given generators for G, an oracle
computing the operation (x, y) ↦ xy^{-1}, a set X, and a function f : G → X such that Ker(f) is the
partition given by the cosets of some subgroup H ≤ G, find a generating set for H [Kit95]. Hidden
subgroup problems have played a central role in the study of quantum algorithms. Integer factoring
and the discrete logarithm problem both easily reduce to Abelian HSPs. The first polynomial-time
quantum algorithm for these problems was discovered by Shor [Sho94]; Kitaev [Kit95] then noticed
that Shor's algorithm in fact solves all Abelian HSPs. The unique shortest vector problem for
lattices reduces to the dihedral HSP [Reg02], which is solvable in subexponential quantum time
[Kup05]. The graph isomorphism problem reduces to the HSP for the symmetric group [Bea97]
or the wreath product S_n ≀ S_2 [EH99], but it is still unknown whether any nontrivial quantum
algorithm exists for GI.

In addition to the HSP for Abelian groups, the HSPs for several families of non-Abelian groups
are also in BQP, for example: the class of so-called "almost Abelian groups", which consists of
groups G where the index of the intersection of all normalizers ⋂_{H ≤ G} N_G(H) is small [GSVV04];
the class of "smoothly solvable" groups, which consists of solvable groups of constant commutator
length whose Abelian factors G_i can each be expressed as the direct product of a group H_i of
constant exponent and a group K_i of size log^{O(1)}(|G_i|) [FIM+03]; and the class of groups whose
commutator subgroup has polylogarithmic size [IMS03].

The proof of Theorem 4.3 showed that if Ker = PEq then every language in UP reduces to Daniel
Simon's problem. We can now see that Simon's problem is in fact the HSP for (Z/2Z)^n, where the
hidden subgroup has order 2. Simon [Sim94] gave a zero-error expected polynomial time quantum
algorithm for this problem, putting it in ZQP ⊆ BQP. This result was later improved by Brassard
and Høyer [BH97] to a worst-case polynomial time quantum algorithm, that is, in the class EQP
(sometimes referred to as just QP).

This discussion motivates the following definition, results, and open question: 

Definition 4.5. Let L ∈ NP. For each a let W(a) denote the set of a's witnesses; without loss
of generality, by padding if necessary, assume that W(a) ⊆ Σ^n for some n. The language L has
groupy witnesses if there are functions mul, gen, dec ∈ FP such that for each a ∈ L:

(1) let G(a) = {x ∈ Σ^n : dec(a, x) = 1}; then for all x, y ∈ G(a), defining xy^{-1} := mul(a, x, y)
gives a group structure to G(a);

(2) gen(a) = (g_1, g_2, ..., g_k) is a generating set for G(a); and

(3) W(a) is a subgroup of G(a), or a subgroup less the identity.

The following results are corollaries to the proof, rather than to the result, of Theorem 4.3.

Corollary 4.6. If Ker = PEq and a language L ∈ NP has groupy witnesses in a family 𝒢 of groups,
then L Cook-reduces to the hidden subgroup problem for the family 𝒢. Briefly: L ≤_T^p HSP(𝒢).

Proof. Let L ∈ NP, let W and G be as in the definition of groupy witnesses, and let V be a
polynomial-time verifier for L such that the witnesses accepted by V on input a are exactly the
strings in W(a). Then the equivalence relation

R_L = {((a, x), (a, y)) : x = y, or V(a, xy^{-1}) = 1, or both x ∉ G(a) and y ∉ G(a)}

is in PEq, since membership in G(a) can be decided in polynomial time by the algorithm dec
guaranteed in the definition of groupy witnesses, and xy^{-1} can be computed by the polynomial-time
algorithm mul guaranteed in the definition of groupy witnesses. By hypothesis, R_L has a
complete invariant f. The function f, the function mul, and the generating set gen(a) are a valid
instance of the hidden subgroup problem. If a ∉ L, then f is injective, and the hidden subgroup is
trivial. If a ∈ L, then the hidden subgroup is W(a). Therefore L reduces to the hidden subgroup
problem. □

Corollary 4.7. If Ker = PEq and the language L has Abelian groupy witnesses, then L ∈ BQP.

Corollary 4.8. Every language in UP has Abelian groupy witnesses.

Open Question 4.9. Are there NP-complete problems with Abelian groupy witnesses? Assuming
P ≠ NP, are there any problems in NP\UP with Abelian groupy witnesses?
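
One natural way to see Corollary 4.8, following the proof of Theorem 4.3, is to take G(a) to be (Z/2Z)^n under bitwise XOR, so that a unique witness together with the identity forms a subgroup of order at most 2. The Python sketch below checks the conditions of Definition 4.5 on a small example; the encoding of bit strings as integers, the fixed length, and the function names are assumptions for illustration, and the instance argument a is omitted because the toy group does not depend on it.

```python
from itertools import product
from typing import Iterable, List, Set

N_BITS = 4  # hypothetical fixed witness length


def mul(x: int, y: int) -> int:
    """x * y^{-1} in (Z/2Z)^n: each element is its own inverse, so this is XOR."""
    return x ^ y


def gen() -> List[int]:
    """Generating set: the n unit vectors."""
    return [1 << k for k in range(N_BITS)]


def is_subgroup(H: Set[int]) -> bool:
    """Subgroup test: nonempty and closed under (x, y) -> x * y^{-1}."""
    return bool(H) and all(mul(x, y) in H for x, y in product(H, repeat=2))


def generated(gens: Iterable[int]) -> Set[int]:
    """Closure of the generators under the group operation."""
    gens = list(gens)
    seen, frontier = {0}, [0]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = mul(g, s)
            if h not in seen:
                seen.add(h)
                frontier.append(h)
    return seen


w = 0b1010  # the unique witness of some hypothetical instance
print(is_subgroup({0, w}), is_subgroup({w}), len(generated(gen())))
```

Here W(a) = {w} is a subgroup less the identity, which is the second alternative in condition (3).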

Our definition of having groupy witnesses is similar but not identical to Arvind and Vinodchandran's
definition of group-definability [AV00]. In fact, if a set A ∈ NP has Abelian groupy witnesses
and |G(a)| is computable in time poly(|a|), then their techniques are sufficient to show that A is
low for PP, and hence unlikely to be NP-complete. Despite this, it may yet be possible to use
Corollary 4.6 to show that Ker = PEq ⟹ NP ⊆ BQP, as there are several classes of non-Abelian
groups for which the HSP is known to be in BQP (see, e.g., [GSVV04, FIM+03, IMS03]).

4.2. Hardness. 

4.2.1. Factoring integers. 

Proposition 4.10. If CF = Ker then integers can be factored in probabilistic polynomial time. 

Proof. Suppose we wish to factor an integer N. We may assume N is not prime, since primality
can be determined in polynomial time [AKS04], but even much weaker machinery lets us do so in
probabilistic polynomial time [SS77, Rab80], which is sufficient here. By hypothesis, the kernel of
the Rabin function x ↦ x^2 (mod N):

R_N = {(x, y) : x^2 ≡ y^2 (mod N)}

has a canonical form f ∈ FP.

Randomly choose x ∈ Z/NZ and let y = f(x). Then x^2 ≡ y^2 (mod N); equivalently, (x −
y)(x + y) ≡ 0 (mod N). If y ≢ ±x (mod N), then since neither x − y nor x + y is ≡ 0 (mod N),
gcd(N, x − y) is a nontrivial factor z of N. Let r(N) be the least number of distinct square roots
modulo N. Then Pr_x[y ≢ ±x] ≥ 1 − 2/r(N). Since N is composite and odd without loss of generality,
r(N) ≥ 4. Thus Pr_x[y ≢ ±x] = Pr_x[the algorithm finds a factor of N] ≥ 1/2. Recursively call the
algorithm on N/z. □
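
The reduction in this proof can be simulated for small N by replacing the hypothesized polynomial-time canonical form with a brute-force one. The Python sketch below does this for a single splitting step; the choice of the least square root as the canonical representative, the retry bound, and the function names are assumptions for illustration, and the brute-force search of course takes time linear in N rather than polynomial in log N.

```python
import random
from math import gcd
from typing import Optional


def canonical_form(x: int, N: int) -> int:
    """Brute-force stand-in for f: the least y with y^2 congruent to x^2 mod N."""
    target = x * x % N
    return next(y for y in range(N) if y * y % N == target)


def splits(x: int, N: int) -> bool:
    """Does this choice of x yield a nontrivial factor of N?"""
    return 1 < gcd(N, x - canonical_form(x, N)) < N


def find_factor(N: int, rng: random.Random, max_tries: int = 64) -> Optional[int]:
    """Try random x until gcd(N, x - f(x)) is a nontrivial factor of N."""
    for _ in range(max_tries):
        x = rng.randrange(N)
        z = gcd(N, x - canonical_form(x, N))
        if 1 < z < N:
            return z
    return None


print(find_factor(21, random.Random(0)))
```

For N = 21 an exhaustive count shows that 10 of the 21 residues lead to a factor, so a handful of random trials suffices in practice for this toy modulus.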

4.2.2. Collision-free hash functions. Collision-free hash functions are a useful cryptographic primitive
(see, e.g., [BSnP95]). Proposition 4.10 suggests a more general connection between the collapse
CF = Ker and the existence of collision-free hash functions.

A collection of collision-free hash functions is a collection of functions {h_i : i ∈ I} for some
I ⊆ Σ*, where h_i : Σ^{|i|+1} → Σ^{|i|}, that are

1. Easily accessible: there is an efficient, i.e., probabilistic polynomial-time, algorithm G such
that G(1^n) ∈ Σ^n ∩ I;

2. Easy to evaluate: there is an efficient algorithm E such that E(i, w) = h_i(w); and

3. Collision-free: for all efficient algorithms A and all polynomials p there is a length N such
that n > N implies:

Pr_{i = G(1^n), (x, y) = A(i)} [x ≠ y and h_i(x) = h_i(y)] < 1/p(n).

It is not known whether collections of collision-free hash functions exist, though their existence
is known to follow from other cryptographic assumptions (see, e.g., [Dam88]). Many proposed
collections of collision-free hash functions, such as MD5 or SHA, can be evaluated deterministically,
that is, $E \in \mathsf{FP}$.

Proposition 4.11. If CF = Ker then collision-free hash functions that can be evaluated in deterministic
polynomial time do not exist.

Proof. The equivalence relation $\{((i, x), (i, y)) : E(i, x) = E(i, y)\}$ has a canonical form $f \in \mathsf{FP}$ by
hypothesis. As in the proof of Proposition 4.10, the canonical form $f$ can be used by a randomized
algorithm to find collisions in $h_i$ with non-negligible probability: choose $x$ at random, and if
$f(x) \neq x$ then a collision has been found.

Since $h_i$ maps $\Sigma^{|i|+1} \to \Sigma^{|i|}$, there are at most $2^{|i|} - 1$ singleton classes in $R = \mathrm{Ker}(h_i)$. If $x$ lies
in an equivalence class of size at least 2, then $\Pr_x[f(x) \neq x \mid \#[x]_R \geq 2] \geq \frac{1}{2}$. Thus

$$\Pr_x[f(x) \neq x] = \Pr_x[f(x) \neq x \mid \#[x]_R \geq 2] \, \Pr_x[\#[x]_R \geq 2] \geq \frac{1}{2}\left(\frac{1}{2} + \frac{1}{2^{|i|+1}}\right) > \frac{1}{4}.$$ □
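The collision-finding step can be sketched in Python as follows. Everything here is an illustrative assumption: `toy_hash` is a made-up compressing map (not a cryptographic hash), and `brute_force_canonical_form` enumerates all inputs as a stand-in for the polynomial-time canonical form that the hypothesis CF = Ker would supply.

```python
import itertools
import random


def toy_hash(w):
    """Hypothetical compressing map {0,1}^(n+1) -> {0,1}^n on bit tuples.

    XORs adjacent bits; chosen only for illustration.
    """
    return tuple(a ^ b for a, b in zip(w, w[1:]))


def brute_force_canonical_form(h, length):
    """Stand-in for the assumed canonical form of Ker(h): least preimage."""
    least = {}
    for w in itertools.product((0, 1), repeat=length):  # lexicographic order
        least.setdefault(h(w), w)
    return lambda w: least[h(w)]


def find_collision(h, f, length, rng, max_tries=64):
    """Sample x at random; if f(x) != x, the pair (x, f(x)) is a collision."""
    for _ in range(max_tries):
        x = tuple(rng.randrange(2) for _ in range(length))
        y = f(x)
        if y != x:  # f(x) is equivalent to x, so h(x) == h(y)
            return x, y
    return None


f = brute_force_canonical_form(toy_hash, 5)
print(find_collision(toy_hash, f, 5, random.Random(1)))
```

For this particular toy map every class has exactly two elements (a string and its bitwise complement), so each trial succeeds with probability 1/2.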
4.2.3. Subgroup equivalence. The subgroup equality problem is: given two subsets $\{g_1, \ldots, g_t\}$,
$\{h_1, \ldots, h_s\}$ of a group $G$, determine if they generate the same subgroup. The group membership
problem is: given a group $G$ and group elements $g_1, \ldots, g_t, x$, determine whether or not
$x \in \langle g_1, \ldots, g_t \rangle$. A solution to the group membership problem yields a solution to the subgroup
equality problem, by determining whether each $h_i$ lies in $\langle g_1, \ldots, g_t \rangle$ and vice versa. However,
a solution to the group membership problem does not obviously yield a complete invariant for
the subgroup equivalence problem. Thus subgroup equivalence problems are a potential source of
candidates for problems in $\mathsf{PEq} \setminus \mathsf{Ker}$.
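The reduction from subgroup equality to group membership can be sketched in Python for small permutation groups. All function names are hypothetical, and `is_member` uses a brute-force closure purely as a placeholder for whatever efficient membership test is available; the point is only that `same_subgroup` calls the membership test once per generator and never computes an invariant of the subgroup.

```python
def compose(p, q):
    """Permutation 'apply q, then p', with permutations as tuples of images."""
    return tuple(p[i] for i in q)


def generated_subgroup(gens, n):
    """Brute-force closure of gens inside S_n (illustrative, exponential)."""
    identity = tuple(range(n))
    elements = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = compose(s, g)
            if h not in elements:
                elements.add(h)
                frontier.append(h)
    return elements


def is_member(x, gens, n):
    """Placeholder membership test: is x in the subgroup generated by gens?"""
    return x in generated_subgroup(gens, n)


def same_subgroup(gens_g, gens_h, n):
    """Subgroup equality via membership of each generator in the other side."""
    return (all(is_member(h, gens_g, n) for h in gens_h)
            and all(is_member(g, gens_h, n) for g in gens_g))


rotation = (1, 2, 0)       # a 3-cycle on {0, 1, 2}
rotation_inv = (2, 0, 1)   # its inverse
print(same_subgroup([rotation], [rotation_inv], 3))
```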

Fortunately or unfortunately, the subgroup equivalence problem for permutation groups on
$\{1, \ldots, n\}$ has a polynomial-time canonical form, via a simple modification [Bab08] of the classic
techniques of Sims [Sim70, Sim71], whose analysis was completed by Furst, Hopcroft, and Luks
[FHL80] and Knuth [Knu91].

4.2.4. Boolean function congruence. Two Boolean functions $f$ and $g$ are congruent if the inputs to $f$
can be permuted and possibly negated to make $f$ equivalent to $g$. If $f$ and $g$ are given by formulae $\varphi$
and $\psi$, respectively, deciding whether $\varphi$ and $\psi$ define congruent functions is Karp equivalent to FI.
If $f$ and $g$ are given by their truth tables, however, Luks [Luk99] gives a polynomial-time algorithm
for deciding whether or not they are congruent. Yet no polynomial-time complete invariant for
Boolean function congruence is known. Hence function congruence may be in $\mathsf{PEq} \setminus \mathsf{Ker}$.
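To fix the definition of congruence, here is a brute-force Python check on truth tables. It is not Luks's algorithm: it tries all $n! \cdot 2^n$ input transformations, which is not polynomial in the truth-table size $2^n$. The helper names and the dictionary representation of truth tables are assumptions made for this sketch.

```python
from itertools import permutations, product


def truth_table(func, n):
    """Truth table of an n-ary Boolean function as a dict on bit tuples."""
    return {x: int(func(*x)) for x in product((0, 1), repeat=n)}


def are_congruent(tf, tg, n):
    """Is there a permutation and negation of inputs turning f into g?"""
    for perm in permutations(range(n)):
        for flips in product((0, 1), repeat=n):
            # y is x with inputs permuted by perm and negated where flips[i] == 1
            if all(tf[tuple(x[perm[i]] ^ flips[i] for i in range(n))] == tg[x]
                   for x in tg):
                return True
    return False


AND = truth_table(lambda a, b: a and b, 2)
NOR = truth_table(lambda a, b: not (a or b), 2)
OR = truth_table(lambda a, b: a or b, 2)
print(are_congruent(AND, NOR, 2), are_congruent(AND, OR, 2))
```

AND and NOR are congruent (negate both inputs), whereas AND and OR are not, since input permutations and negations preserve the number of satisfying assignments.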

4.2.5. Complete problems? Equivalence problems that are P-complete under NC or L reductions 
may lie in PEq\Ker due to their inherent difficulty. However, we currently have no reason to believe 
that P-completeness is related to complexity classes of equivalence problems. Towards this end, we 
introduce a natural notion of reduction for equivalence problems: 

Definition 4.12. An equivalence relation $R$ kernel-reduces to an equivalence relation $S$, denoted
$R \leq^{P}_{\mathrm{ker}} S$, if there is a function $f \in \mathsf{FP}$ such that

$$(x, y) \in R \iff (f(x), f(y)) \in S.$$

Note that $R \in \mathsf{Ker}$ iff $R$ kernel-reduces to the relation of equality. Also note that if $R \leq^{P}_{\mathrm{ker}} S$ via
$f$, then $R \leq^{P}_{m} S$ via $(x, y) \mapsto (f(x), f(y))$, leading to the question:

Open Question 4.13. Are kernel reductions and Karp reductions different? Are they different on



An equivalence relation $R \in \mathsf{PEq}$ is PEq-complete if every $S \in \mathsf{PEq}$ kernel-reduces to $R$. For any
PEq-complete $R$, $R \in \mathsf{Ker}$ iff Ker = PEq iff the relation of equality is PEq-complete. Unlike
NP-completeness, however, the notion of PEq-completeness does not become trivial if Ker = PEq: the
relation of equality does not kernel-reduce to the trivial relation. More generally, if $R \leq_{\mathrm{ker}} S$, then
$S$ cannot have fewer equivalence classes than $R$, even without a complexity bound on the reduction;
a complexity bound further implies a relationship between the densities of the two relations.
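As a small illustration of kernel reduction to equality, consider the relation "two lists contain the same set of numbers". The Python sketch below uses hypothetical names: `set_invariant` is a kernel reduction from this relation to equality, and `as_karp_reduction` lifts any kernel reduction $f$ to the pair map $(x, y) \mapsto (f(x), f(y))$.

```python
def set_invariant(xs):
    """Sorted, de-duplicated tuple: a complete invariant for 'same set'."""
    return tuple(sorted(set(xs)))


def as_karp_reduction(f):
    """Lift a kernel reduction f to the pair map (x, y) -> (f(x), f(y))."""
    return lambda x, y: (f(x), f(y))


to_equality = as_karp_reduction(set_invariant)
u, v = to_equality([3, 1, 2, 3], [2, 3, 1])
print(u == v)
```

Two lists are equivalent exactly when their images under `set_invariant` are equal, so the lifted map sends equivalent pairs to pairs of identical tuples and inequivalent pairs to pairs of distinct tuples.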

Open Question 4.14. Are there PEq-complete equivalence problems? 



5. Oracles

We make significant use of generic oracles for various notions of genericity, i.e., forcing. Similar
to Fenner, Fortnow, Kurtz, and Li [FFKL03], when we say

Let $O$ be an $X$-generic oracle ...

it should be read

Let $O = A \oplus B$ where $A$ is PSPACE-complete and $B$ is an $X$-generic oracle relative
to $A$ ...
For some of these results, we will need a new notion of genericity: transitive genericity. A
transitive condition $\sigma$ is a Cohen condition satisfying

(1) Length restriction: $(x, y) \in \sigma$ only if $|x| = |y|$, and

(2) Transitivity: $(x, y) \in \sigma$ and $(y, z) \in \sigma$ implies $(x, z) \in \sigma$.

By Lemma 2.6, transitive generic oracles exist.

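Theorem 5.1. There are oracles $A$ and $B$ relative to which $\mathsf{P} \neq \mathsf{NP}$ and

$$\mathsf{CF}(\mathsf{FP}^{A}) \neq \mathsf{Ker}(\mathsf{FP}^{A}) \neq \mathsf{P}^{A}\mathsf{Eq},$$
$$\mathsf{CF}(\mathsf{FP}^{B})_p = \mathsf{Ker}(\mathsf{FP}^{B})_p \text{ and } \mathsf{Ker}(\mathsf{FP}^{B}) \neq \mathsf{P}^{B}\mathsf{Eq}.$$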

We break most of the proof into three lemmata. The proofs of Lemmata 5.3 and 5.4 are adaptations
of the proofs in [BG84a] to generic oracles. The proof of Lemma 5.5 is new.
We start by restating a useful combinatorial lemma:

Lemma 5.2 ([BG84a] Lemma 1). Let $G$ be a directed graph on $2k$ vertices such that the out-degree
of each vertex is strictly less than $k$. Then there are two nonadjacent vertices in $G$.

Lemma 5.2 can be proved by a simple counting argument.
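One way to spell out the count: $G$ has at most $2k(k-1)$ edges, each edge makes at most one unordered pair of vertices adjacent, and there are $k(2k-1) > 2k(k-1)$ unordered pairs, so some pair has no edge in either direction. The following Python sketch (with a hypothetical function name and an adjacency-dictionary representation chosen for illustration) finds such a pair by exhaustive search.

```python
from itertools import combinations


def nonadjacent_pair(out_neighbors):
    """Return two vertices with no edge between them in either direction.

    out_neighbors maps each vertex to the set of its out-neighbours.
    Returns None if every pair of vertices is adjacent.
    """
    for u, v in combinations(sorted(out_neighbors), 2):
        if v not in out_neighbors[u] and u not in out_neighbors[v]:
            return u, v
    return None


# A directed 4-cycle: 2k = 4 vertices, every out-degree is 1 < k = 2.
cycle = {0: {1}, 1: {2}, 2: {3}, 3: {0}}
print(nonadjacent_pair(cycle))
```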

Lemma 5.3. There is a (transitive generic) oracle relative to which $\mathsf{Ker} \neq \mathsf{PEq}$.

Proof. Let $\tau$ be a transitive condition, and let $\overline{\tau}$ denote the minimal transitive oracle extending
$\tau$, that is, $(a, a) \in \overline{\tau}$ for every $a \in \Sigma^*$, but the only other pairs $(x, y) \in \overline{\tau}$ are those in $\tau$. Let $M$ be a
polynomial-time oracle transducer running in time $p(|x|)$. Let $n$ be a length such that $p(n) < 2^{n-1}$
and $\tau$ is not defined on $(a, b)$ for any strings $a$ and $b$ of length $\geq n$. If there are distinct strings $x$
and $y$ of length $n$ such that $M^{\overline{\tau}}(x) = M^{\overline{\tau}}(y)$, then extend $\tau$ to length $p(n)$ as $\overline{\tau}$. Then $x \not\sim_{\tau} y$ but
$M^{\tau}(x) = M^{\tau}(y)$.

Otherwise, $M^{\overline{\tau}}(x) \neq M^{\overline{\tau}}(y)$ for every two distinct strings $x$ and $y$ of length $n$. Say that $x$ affects $y$ if $M$
queries $\overline{\tau}$ about $(x, y)$ or $(y, x)$ in the computation of $M^{\overline{\tau}}(y)$. Let $G$ be a digraph on the strings
of length $n$, where there is a directed edge from $x$ to $y$ if $x$ affects $y$. By the condition on $n$, the
out-degree of each vertex is less than $2^{n-1}$. Since there are $2^n$ vertices, Lemma 5.2 implies that there
are two strings $x$ and $y$ of length $n$ such that neither affects the other. Put $(x, y)$ and $(y, x)$ into $\tau$.
Thus $M^{\tau}(x) \neq M^{\tau}(y)$ but $x \sim_{\tau} y$.

Thus there is a transitive generic oracle $O$ such that $\mathsf{Ker} \neq \mathsf{PEq}$ relative to $O$. □

Lemma 5.4. There is a (Cohen generic) oracle relative to which $\mathsf{CF} \neq \mathsf{Ker}$.

Proof. We describe the oracle $O$ over the alphabet $\{0, 1, 2\}$ for simplicity. Let $\mathrm{read} : \Sigma^* \to \Sigma^*$
denote the oracle function

$$\mathrm{read}^{O}(x) = O(x01)\,O(x011) \cdots O(x01^{k-1})$$

where $k$ is the least value such that $O(x01^k) = 2$. Note that the bits used by read on input $x$ are
disjoint from those used by read on any input $y \neq x$. Let $R^{O} = \mathrm{Ker}(\mathrm{read}^{O})$.
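The oracle function read can be sketched in Python as follows. The representation is an assumption made for illustration: the oracle is a callable from strings to symbols in {0, 1, 2}, backed here by a finite table that returns 2 on every string it does not mention.

```python
def read(oracle, x):
    """Concatenate O(x01), O(x011), ... up to, not including, the first 2."""
    out = []
    query = x + "01"
    while True:
        symbol = oracle(query)
        if symbol == 2:
            return "".join(out)
        out.append(str(symbol))
        query += "1"


# Hypothetical finite oracle table; unspecified strings map to 2.
table = {"1001": 1, "10011": 0}
oracle = lambda q: table.get(q, 2)
print(repr(read(oracle, "10")), repr(read(oracle, "11")))
```

With such a default, read returns the empty string on every input whose query strings are absent from the table, and the loop terminates because only finitely many strings have a value other than 2.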

Let $\tau$ be a Cohen condition, and let $\overline{\tau}$ denote the oracle extending $\tau$ which has value 2 outside
$\mathrm{dom}(\tau)$. Let $M$ be a polynomial-time oracle transducer running in time $p(|x|)$. Let $n$ be a length
such that $p(n) < 2^{n-1}$ and $\mathrm{read}^{\overline{\tau}}(x)$ is the empty string $\varepsilon$ for all strings of length $\geq n$. For a string
$x$ of length $n$, let $\tau_x$ denote the minimal extension of $\tau$ such that $\mathrm{read}^{\tau_x}$ is the identity on all strings
of length $n$ except $\mathrm{read}^{\tau_x}(x) = 1^n$. Note that $\mathrm{read}^{\tau_x}$ is injective on strings of length $n$, so its
kernel at length $n$ is the relation of equality. In particular, any canonical form for $R^{\tau_x}$ must be the
identity on strings of length $n$.
If there is an $x$ of length $n$ such that $M^{\tau_x}(x) \neq x$, then $M^{\tau_x}$ is not the identity on strings of
length $n$, so $M^{\tau_x}$ is not a canonical form for $R^{\tau_x}$. Extend $\tau$ to $\tau_x$.

Otherwise, $M^{\tau_x}(x) = x$ for all $x$ of length $n$. Find $x$ and $y$ of length $n$ such that $M^{\tau_x}(x)$ does
not query the oracle about $y$ and $M^{\tau_y}(y)$ does not query the oracle about $x$. This is possible by
Lemma 5.2, as in the proof of Lemma 5.3. Then update $\tau$ so that $\mathrm{read}^{\tau}(x) = \mathrm{read}^{\tau}(y)$. Again,
$M^{\tau}$ cannot be a canonical form for $R^{\tau}$.

Thus there is a Cohen generic oracle relative to which $\mathsf{CF} \neq \mathsf{Ker}$. □

Lemma 5.5. If $A$ is PSPACE-complete and $O$ has at most one string of each length $\mathrm{tower}(k)$ and
no other strings, then relative to $A \oplus O$, $\mathsf{CF}(\mathsf{FP})_p = \mathsf{Ker}(\mathsf{FP})_p$.

Proof. Relativize to a base PSPACE-complete oracle. Let $O$ have at most one string of each length
$\mathrm{tower}(k)$, and no other strings. Let $f$ be an oracle transducer running in polynomial time $p(|x|)$,
let $R = \mathrm{Ker}(f^{O})$, and suppose that $(x, y) \in R$ implies $|x| \leq q(|y|)$ for some polynomial $q$. For any
input $x$ of sufficient length, all elements of $O$ except possibly one have length either $\leq \log p(|x|)$,
in which case they can be found rapidly, or $> q(p(|x|))$, in which case they cannot be queried by $f$
on any input $y \sim x$. Following a technique used in [BF99], we call this one element the "cookie"
for this equivalence class.

For the remainder of this proof, "minimum," "least," etc. will be taken with respect to the 
standard length-lexicographic ordering. 

We show how to efficiently compute a canonical form for $R$. Let $R_y$ denote the inverse image of
$y$ under $f^{O}$, which is an $R$-equivalence class. Let

$$B_y = \{x : f^{O}(x) = y \text{ and } f^{O}(x) \text{ does not query the cookie}\},$$

$r_y = \min R_y$, and $b_y = \min B_y$. A canonical form for $R$ is

$$g(x) = \begin{cases} b_y & \text{if } B_y \neq \emptyset \\ r_y & \text{otherwise,} \end{cases}$$

where $y = f^{O}(x)$. Now we show that $g$ is in fact in $\mathsf{FP}^{O}$. On input $x$, the computation of $g$ proceeds
as follows:

(1) Find all elements of $O$ of length at most $\log p(|x|)$. Any further queries to $O$ of length
$\leq \log p(|x|)$ will be simulated without queries by using this data.

(2) Compute $y = f^{O}(x)$.

(3) If the cookie was queried, then all further queries to $O$ will be simulated without queries
using this data. Using the power of PSPACE, determine whether or not $B_y = \emptyset$. If $B_y = \emptyset$,
find and output $r_y$. If $B_y \neq \emptyset$, find and output $b_y$.

(4) If the cookie was not queried, then $x \in B_y$, so $B_y \neq \emptyset$. Use the power of PSPACE to find
the least $z$ such that $f(z) = y$, answering 0 to any queries made by $f$ to strings of length $t$
with $\log p(|x|) < t \leq q(p(|x|))$.

(5) Run $f^{O}(z)$. If $f^{O}(z) = y$, then $z = b_y$, so output $z$. Otherwise, $f^{O}(z)$ queried the cookie,
so no further oracle queries need be made. Using the power of PSPACE, find and output $b_y$.

□ 

Proof of Theorem 5.1. ($\mathsf{CF} \neq \mathsf{Ker} \neq \mathsf{PEq}$) Let $A = A_0 \oplus A_1$ where $A_0$ is transitive generic and
$A_1$ is Cohen generic. The constructions in the proofs of Lemmata 5.3 and 5.4 go through mutatis
mutandis.

($\mathsf{CF}_p = \mathsf{Ker}_p$ and $\mathsf{Ker} \neq \mathsf{PEq}$) Let $B$ be a transitive UP-generic oracle. Then the proofs of Lemmata
5.3 and 5.5 go through. □
Open Question 5.6. Does CF = Ker imply P = NP? Or is there an oracle relative to which
CF = Ker but nonetheless $\mathsf{P} \neq \mathsf{NP}$? Further, is there an oracle relative to which $\mathsf{P} \neq \mathsf{NP}$ but
CF = Ker = PEq?

Open Question 5.7. Is there an oracle relative to which $\mathsf{CF} \neq \mathsf{Ker} = \mathsf{PEq}$?

6. Future Work 

Here we present several directions for future work, in addition to the open problems mentioned 
throughout the paper. In no particular order: 

• Is LEq contained in $\mathsf{CF}(\mathsf{FL}^{\mathsf{NL}})$? Is it contained in CF(FP)? In Ker(FP)? We note that the
straightforward binary search technique used to show $\mathsf{PEq} \subseteq \mathsf{LexEqFP}^{\mathsf{NP}}$ does not work in
logarithmic space. Jenner and Torán [JT97] showed that the lexicographically minimal (or
maximal - in this case the same technique works) solution of any NL search problem can be
computed in $\mathsf{FL}^{\mathsf{NL}}$. However, the notion of an NL search problem is based on the following
characterization of NL due to Lange [Lan86]: a language $A$ is in NL iff there is a logspace
machine $M(x, \vec{y})$ that reads its second input in one direction only, indicated by "$\vec{y}$", such
that

$$x \in A \iff (\exists^{p} y)[M(x, \vec{y}) = 1]$$

Without the one-way restriction, this definition would give a characterization of NP rather
than NL. An NL search problem is then: given such a machine $M$ and input $x$, find a $y$
such that $M(x, \vec{y}) = 1$. Any equivalence relation that can be decided by such a machine is
in $\mathsf{LexEqFL}^{\mathsf{NL}}$, but it is not clear that this captures all of LEq.

• Does CF(FL) = Ker(FL) imply NL = UL? Note that NL = UL iff $\mathsf{FL}^{\mathsf{NL}} \subseteq \#\mathsf{L}$ [ÀJ93].

• Does CF(FL) = LEq imply $\mathsf{UL} \subseteq \mathsf{RL}$? A positive answer to this question and the previous
one would give very strong evidence that $\mathsf{CF}(\mathsf{FL}) \neq \mathsf{LEq}$, as significant progress has been
made towards showing L = RL [RTV05].

• Study expected polynomial-time canonical forms. If every $R \in \mathsf{Ker}(\mathsf{FP})$ has an expected
polynomial-time canonical form, does PH collapse?

• Find a class of groups for which the group membership problem is in P but no efficient
complete invariant is known for the subgroup equivalence problem (see Section 4.2.3).

• If Ker = PEq, does PH collapse? 

• LexEqFP ZiP = CF(FP LiP ) = Ker(FP LiP ) = P LiP Eq. If Ker(FP LiP ) = P LiP Eq does PH col- 
lapse? 

• Study counting classes of equivalence relations. For an equivalence relation $R$, the associated
counting function is $f(x) = \#[x]_R$.

• Study complexity classes of lattices, partial orders, and total orders. 